22 research outputs found
Tackling the Curse of Dimensionality in Large-scale Multi-agent LTL Task Planning via Poset Product
Linear Temporal Logic (LTL) formulas have been used to describe complex tasks
for multi-agent systems, with both spatial and temporal constraints. However,
since the planning complexity grows exponentially with the number of agents and
the length of the task formula, existing applications are mostly limited to
small artificial cases. To address this issue, a new planning algorithm is
proposed for tasks specified as syntactically co-safe LTL (sc-LTL) formulas.
It avoids two common
bottlenecks in the model-checking-based planning methods, i.e., (i) the direct
translation of the complete task formula to the associated Büchi automaton;
and (ii) the synchronized product between the Büchi automaton and the
transition models of all agents. In particular, each conjoined sub-formula is
first converted to the associated R-posets as an abstraction of the temporal
dependencies among the subtasks. Then, an efficient algorithm is proposed to
compute the product of these R-posets, which retains their dependencies and
resolves potential conflicts. Furthermore, the proposed approach is applied to
dynamic scenes where new tasks are generated online. It is capable of deriving
the first valid plan with a polynomial time and memory complexity w.r.t. the
system size and the formula length. Our method can plan for task formulas with
a length of more than 60 and a system with more than 35 agents, while most
existing methods fail at a formula length of 20. The proposed method is
validated on large fleets of service robots in both simulation and hardware
experiments.
Comment: 9 pages, 9 figures
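The core idea above can be illustrated with a small sketch. This is not the authors' implementation: it merely shows how two sets of precedence relations (stand-ins for R-posets) can be composed by taking the union of their ordering constraints and rejecting products whose combined constraints are cyclic, i.e., conflicting. All names are hypothetical.

```python
def poset_product(relations_a, relations_b):
    """Merge two sets of (before, after) precedence pairs.

    Returns the combined relation set, or None if the union contains a
    cycle, i.e. the two posets impose contradictory subtask orderings.
    """
    combined = set(relations_a) | set(relations_b)
    # Kahn's algorithm: a topological order exists iff the graph is acyclic.
    nodes = {n for pair in combined for n in pair}
    indeg = {n: 0 for n in nodes}
    for _, after in combined:
        indeg[after] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        n = queue.pop()
        seen += 1
        for before, after in combined:
            if before == n:
                indeg[after] -= 1
                if indeg[after] == 0:
                    queue.append(after)
    return combined if seen == len(nodes) else None

# Compatible posets merge; contradictory orderings are flagged as conflicts.
ok = poset_product({("a", "b")}, {("b", "c")})        # -> {(a,b), (b,c)}
conflict = poset_product({("a", "b")}, {("b", "a")})  # -> None
```

The key property this sketch preserves is the one the abstract emphasizes: the product retains the dependencies of both operands while detecting conflicts, without ever building a product automaton.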
Time Minimization and Online Synchronization for Multi-agent Systems under Collaborative Temporal Tasks
Multi-agent systems can be extremely efficient when solving a team-wide task
in a concurrent manner. However, without proper synchronization, the
correctness of the combined behavior is hard to guarantee, such as following a
specific ordering of sub-tasks or performing a simultaneous collaboration. This
work addresses the minimum-time task planning problem for multi-agent systems
under complex global tasks stated as Linear Temporal Logic (LTL) formulas.
These tasks include the temporal and spatial requirements on both independent
local actions and direct sub-team collaborations. The proposed solution is an
anytime algorithm that combines the partial-ordering analysis of the underlying
task automaton for task decomposition, and the branch and bound (BnB) search
method for task assignment. Analyses of its soundness, completeness, and
optimality with respect to the minimal completion time are provided. It is also
shown that a
feasible and near-optimal solution is quickly reached while the search
continues within the time budget. Furthermore, to handle fluctuations in task
duration and agent failures during online execution, an adaptation algorithm is
proposed to synchronize execution status and re-assign unfinished subtasks
dynamically to maintain correctness and optimality. Both algorithms are
validated rigorously over large-scale systems via numerical simulations and
hardware experiments, against several strong baselines.
Comment: 17 pages, 14 figures
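A minimal branch-and-bound sketch conveys the task-assignment step: assign subtask durations to agents so as to minimize the makespan (latest finishing time). This is purely illustrative; the paper's algorithm additionally respects the partial order extracted from the task automaton, which this toy omits.

```python
def bnb_assign(durations, n_agents):
    """Exhaustive BnB over task-to-agent assignments, pruned by makespan."""
    best = {"cost": float("inf"), "plan": None}
    loads = [0.0] * n_agents

    def branch(i, plan):
        if max(loads) >= best["cost"]:   # bound: prune dominated branches
            return
        if i == len(durations):          # leaf: all tasks assigned
            best["cost"], best["plan"] = max(loads), list(plan)
            return
        for a in range(n_agents):        # branch: try each agent for task i
            loads[a] += durations[i]
            branch(i + 1, plan + [a])
            loads[a] -= durations[i]

    branch(0, [])
    return best["cost"], best["plan"]

cost, plan = bnb_assign([4, 3, 2, 3], n_agents=2)  # optimal makespan is 6
```

Because the first complete assignment immediately supplies an upper bound, this search has the anytime flavor described in the abstract: a feasible plan is available early, and the bound tightens as the search continues.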
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Existing text-video retrieval solutions are, in essence, discriminative models
focused on maximizing the conditional likelihood, i.e., p(candidates|query).
While straightforward, this de facto paradigm overlooks the underlying data
distribution p(query), which makes it challenging to identify
out-of-distribution data. To address this limitation, we creatively tackle this
task from a generative viewpoint and model the correlation between the text and
the video as their joint probability p(candidates,query). This is accomplished
through a diffusion-based text-video retrieval framework (DiffusionRet), which
models the retrieval task as a process of gradually generating the joint
distribution from noise. During training, DiffusionRet is optimized from both
the generation and discrimination perspectives, with the generator being
optimized by generation loss and the feature extractor trained with contrastive
loss. In this way, DiffusionRet cleverly leverages the strengths of both
generative and discriminative methods. Extensive experiments on five commonly
used text-video retrieval benchmarks (MSRVTT, LSMDC, MSVD, ActivityNet
Captions, and DiDeMo) demonstrate the superior performance of our method. More
encouragingly, without any modification,
DiffusionRet even performs well in out-domain retrieval settings. We believe
this work brings fundamental insights into the related fields. Code is
available at https://github.com/jpthu17/DiffusionRet.
Comment: Accepted by ICCV 2023
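The discriminative-versus-generative distinction in the abstract can be made concrete with a toy example (not the DiffusionRet model): a softmax over similarities always yields a confident conditional p(candidate | query), even for a query far from the training data, whereas a score that also accounts for p(query) collapses for out-of-distribution queries. The density estimator and data here are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
train_queries = rng.normal(0.0, 1.0, size=(500, 2))  # in-distribution queries
candidates = np.array([[1.0, 0.0], [-1.0, 0.0]])     # two candidate embeddings

def conditional(q):
    """Discriminative view: softmax over similarity logits, p(c | q)."""
    sims = candidates @ q
    e = np.exp(sims - sims.max())
    return e / e.sum()                  # always sums to 1, however odd q is

def query_density(q):
    """Crude kernel density estimate of p(q) from the training queries."""
    d2 = ((train_queries - q) ** 2).sum(axis=1)
    return np.exp(-d2 / 2).mean()

q_in = np.array([0.5, 0.0])
q_out = np.array([50.0, 50.0])          # far outside the training data

# The conditional alone cannot tell the two queries apart...
p_in, p_out = conditional(q_in).max(), conditional(q_out).max()
# ...but a joint-style score p(q) * p(c | q) collapses for the OOD query.
joint_in = query_density(q_in) * p_in
joint_out = query_density(q_out) * p_out
```

This is exactly the failure mode the abstract attributes to purely discriminative retrieval, and the property that motivates modeling the joint probability instead.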
Optical Flow Sensor/INS/Magnetometer Integrated Navigation System for MAV in GPS-Denied Environment
The drift of an inertial navigation system (INS) leads to large navigation errors when a low-cost INS is used on micro aerial vehicles (MAVs). To overcome this problem, an INS/optical-flow/magnetometer integrated navigation scheme is proposed for GPS-denied environments in this paper. The scheme, based on an extended Kalman filter, combines INS and optical flow information to estimate the velocity and position of the MAV. The gyroscope, accelerometer, and magnetometer measurements are fused to estimate the MAV attitude when the MAV is static or moving uniformly; only the gyroscope is used to estimate the attitude when the MAV is accelerating or decelerating. MAV flight data are used to verify the proposed integrated navigation scheme, and the results show that it effectively reduces the errors of the navigation parameters and improves navigation precision.
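A minimal EKF-style predict/update step sketches the velocity channel of such a scheme: the INS (accelerometer) drives the prediction, and the optical flow sensor supplies the velocity measurement. All matrices and noise values below are illustrative placeholders, not taken from the paper.

```python
import numpy as np

dt = 0.01
F = np.eye(2)            # velocity carried forward between steps
B = np.eye(2) * dt       # acceleration integrates into velocity
H = np.eye(2)            # optical flow measures velocity directly
Q = np.eye(2) * 1e-3     # process noise (INS drift)
R = np.eye(2) * 1e-2     # measurement noise (optical flow)

def ekf_step(v, P, accel, flow_vel):
    # Predict: integrate the accelerometer reading.
    v_pred = F @ v + B @ accel
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the optical-flow velocity.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    v_new = v_pred + K @ (flow_vel - H @ v_pred)
    P_new = (np.eye(2) - K @ H) @ P_pred
    return v_new, P_new

v, P = np.zeros(2), np.eye(2)
v, P = ekf_step(v, P, accel=np.array([1.0, 0.0]),
                flow_vel=np.array([0.02, 0.0]))
```

The update step is what bounds the INS drift: each optical-flow measurement shrinks the velocity covariance that pure inertial integration would otherwise let grow without bound.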
AutoConv: Automatically Generating Information-seeking Conversations with Large Language Models
Information-seeking conversation, which aims to help users gather information
through conversation, has achieved great progress in recent years. However, the
research is still stymied by the scarcity of training data. To alleviate this
problem, we propose AutoConv for synthetic conversation generation, which takes
advantage of the few-shot learning ability and generation capacity of large
language models (LLMs). Specifically, we formulate the conversation generation
problem as a language modeling task, then finetune an LLM on a few human
conversations to capture the characteristics of the information-seeking process
and use it to generate high-quality synthetic conversations.
Experimental results on two frequently-used datasets verify that AutoConv has
substantial improvements over strong baselines and alleviates the dependence on
human annotation. In addition, we also provide several analyses to promote
future research.
Comment: Accepted to ACL 2023 Main Conference (Short Paper)
Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation
Weakly supervised semantic segmentation is typically inspired by class
activation maps, which serve as pseudo masks with class-discriminative regions
highlighted. Although tremendous efforts have been made to recall precise and
complete locations for each class, existing methods still commonly suffer from
unsolicited Out-of-Candidate (OC) error predictions that do not belong to the
label candidates. Such errors are avoidable, since their contradiction with the
image-level class tags is easy to detect. In this paper, we develop a
group ranking-based Out-of-Candidate Rectification (OCR) mechanism in a
plug-and-play fashion. Firstly, we adaptively split the semantic categories
into In-Candidate (IC) and OC groups for each OC pixel according to their prior
annotation correlation and posterior prediction correlation. Then, we derive a
differentiable rectification loss to force OC pixels to shift to the IC group.
Incorporating our OCR with seminal baselines (e.g., AffinityNet, SEAM,
MCTformer), we can achieve remarkable performance gains on both Pascal VOC
(+3.2%, +3.3%, +0.8% mIoU) and MS COCO (+1.0%, +1.3%, +0.5% mIoU) datasets with
negligible extra training overhead, which justifies the effectiveness and
generality of our OCR.
Comment: Accepted to CVPR 2023
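The rectification idea can be sketched with a simple hinge formulation (not the paper's exact loss): for a pixel, penalize any Out-of-Candidate (OC) score that does not trail every In-Candidate (IC) score by a margin, so gradients push OC pixels back into the IC group.

```python
import numpy as np

def rectification_loss(logits, candidate_ids, margin=1.0):
    """logits: (num_classes,) scores for one pixel.
    candidate_ids: the image-level class tags (the IC group)."""
    ic = np.array(sorted(candidate_ids))
    oc = np.array([c for c in range(len(logits)) if c not in candidate_ids])
    # Hinge: each OC logit should sit at least `margin` below each IC logit.
    gaps = margin + logits[oc][:, None] - logits[ic][None, :]
    return np.maximum(gaps, 0.0).mean()

logits = np.array([2.0, 0.5, 3.0])   # class 2 scores highest at this pixel...
loss = rectification_loss(logits, candidate_ids={0, 1})  # ...but only {0, 1} are tagged
```

A pixel whose IC logits already dominate incurs zero loss, so the term is inactive wherever the baseline's prediction is consistent with the image-level tags, matching the plug-and-play character described above.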
NewsDialogues: Towards Proactive News Grounded Conversation
Hot news is one of the most popular topics in daily conversations. However,
news grounded conversation has long been stymied by the lack of well-designed
task definition and scarce data. In this paper, we propose a novel task,
Proactive News Grounded Conversation, in which a dialogue system can
proactively lead the conversation based on some key topics of the news. In
addition, both information-seeking and chit-chat scenarios are realistically
included: the user may ask a series of questions about the news details, or
express their opinions and simply want to chat. To further develop this
novel task, we collect a human-to-human Chinese dialogue dataset
NewsDialogues, which includes 1K conversations with a total of 14.6K
utterances and detailed annotations for target topics and knowledge spans.
Furthermore, we propose a method named Predict-Generate-Rank, consisting of a
generator for grounded knowledge prediction and response generation, and a
ranker that ranks multiple candidate responses to alleviate exposure bias. We
conduct comprehensive experiments to demonstrate the effectiveness of the
proposed method and further present several key findings and challenges to
prompt future research.
Comment: Accepted to ACL 2023 Conference (Long Paper; Findings)
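The "Rank" step of Predict-Generate-Rank can be sketched as follows: score multiple candidate responses against the predicted knowledge span and keep the best one. The token-overlap scorer is a hypothetical stand-in for the learned ranker.

```python
def rank_responses(candidates, knowledge):
    """Return candidates ordered by token overlap with the grounded knowledge."""
    know = set(knowledge.lower().split())

    def score(resp):
        toks = set(resp.lower().split())
        return len(toks & know) / max(len(toks), 1)

    return sorted(candidates, key=score, reverse=True)

ranked = rank_responses(
    ["The match ended in a draw.", "I like football."],
    knowledge="The championship match ended in a dramatic draw.",
)
```

Ranking over several sampled responses, rather than committing to the generator's single greedy output, is what lets this step compensate for exposure bias in the generator.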